A Knowledge-Gradient Policy for Sequential Information Collection

Authors

  • Peter I. Frazier
  • Warren B. Powell
  • Savas Dayanik
Abstract

In a sequential Bayesian ranking and selection problem with independent normal populations and common known variance, we study a previously introduced measurement policy which we refer to as the knowledge-gradient policy. This policy myopically maximizes the expected increment in the value of information in each time period, where the value is measured according to the terminal utility function. We show that the knowledge-gradient policy is optimal both when the horizon is a single time period and in the limit as the horizon extends to infinity. We show furthermore that, in some special cases, the knowledge-gradient policy is optimal regardless of the length of any given fixed total sampling horizon. We bound the knowledge-gradient policy’s suboptimality in the remaining cases, and show through simulations that it performs competitively with or significantly better than other policies.
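As a rough illustration of the myopic rule described above, the following sketch computes a knowledge-gradient score for each alternative under independent normal beliefs and a common known sampling variance, then measures the alternative with the largest score. It assumes the closed form commonly given for this setting (the expected one-step improvement in the maximum posterior mean); the function names `kg_factors`, `kg_choice`, and `update_belief` are hypothetical and do not come from the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(means, variances, noise_variance):
    """Expected one-step gain in max posterior mean for each alternative.

    Assumes independent normal beliefs N(means[x], variances[x]) with
    variances[x] > 0, at least two alternatives, and a common known
    sampling variance (noise_variance).
    """
    factors = []
    for x, (mu, var) in enumerate(zip(means, variances)):
        best_other = max(m for i, m in enumerate(means) if i != x)
        # Standard deviation of the change in the posterior mean of x
        # caused by one more sample of x.
        sigma_tilde = var / math.sqrt(var + noise_variance)
        z = -abs(mu - best_other) / sigma_tilde
        factors.append(sigma_tilde * (z * normal_cdf(z) + normal_pdf(z)))
    return factors


def kg_choice(means, variances, noise_variance):
    """Index of the alternative the myopic rule would measure next."""
    factors = kg_factors(means, variances, noise_variance)
    return max(range(len(factors)), key=lambda i: factors[i])


def update_belief(mean, variance, observation, noise_variance):
    """Conjugate normal update after observing one sample."""
    precision = 1.0 / variance + 1.0 / noise_variance
    new_mean = (mean / variance + observation / noise_variance) / precision
    return new_mean, 1.0 / precision


if __name__ == "__main__":
    means, variances = [0.0, 0.0], [1.0, 4.0]
    print(kg_factors(means, variances, noise_variance=1.0))
    print(kg_choice(means, variances, noise_variance=1.0))
```

With equal means, the score depends only on how much one sample can move the posterior mean, so the alternative with the larger prior variance is measured first.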

Similar resources

Consistency of Sequential Bayesian Sampling Policies

We consider Bayesian information collection, in which a measurement policy collects information to support a future decision. This framework includes ranking and selection, continuous global optimization, and many other problems in sequential experimental design. We give a sufficient condition under which measurement policies sample each measurement type infinitely often, ensuring consistency, ...

Information Collection on a Graph

We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values in order to learn about the best path. This problem differs from traditional ranking and selection, in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure)...

Asymptotic Optimality of Sequential Sampling Policies for Bayesian Information Collection

We consider adaptive sequential sampling policies in a Bayesian framework. Under the assumptions that the sampling distribution is from an exponential family and that the number of distinct measurement types is finite, we give sufficient conditions for an adaptive sampling policy to achieve asymptotic optimality. Here, asymptotic optimality is understood to mean that the limit of the expected l...

Convergence to Global Optimality with Sequential Bayesian Sampling Policies

We consider Bayesian information collection, in which a measurement policy collects information to support a future decision. This framework includes problems in ranking and selection, reinforcement learning, and continuous global optimization. We give sufficient conditions under which measurement policies achieve asymptotically minimal expected loss. Achieving asymptotically minimal expected l...

Finite-time Analysis for the Knowledge-Gradient Policy

We consider sequential decision problems in which we adaptively choose one of finitely many alternatives and observe a stochastic reward. We offer a new perspective of interpreting Bayesian ranking and selection problems as adaptive stochastic multi-set maximization problems and derive the first finite-time bound of the knowledge-gradient policy for adaptive submodular objective functions. In a...

Journal:
  • SIAM J. Control and Optimization

Volume 47

Publication date: 2008